PAN at FIRE: Overview of the PR-SOCO Track on Personality Recognition in SOurce COde

Authors

  • Francisco M. Rangel Pardo
  • Fabio A. González
  • Felipe Restrepo-Calle
  • Manuel Montes-y-Gómez
  • Paolo Rosso

Abstract

Author profiling consists of predicting an author’s characteristics (e.g. age, gender, personality) from her writing. After addressing mainly age and gender identification at PAN@CLEF, as well as personality recognition on Twitter, this PAN@FIRE track on Personality Recognition from SOurce COde (PR-SOCO) addresses the problem of predicting an author’s personality traits from her source code. In this paper, we analyse 48 runs submitted by 11 participating teams. Given a set of Java source code files written by students who also completed a personality test, participants had to predict personality traits based on the Big Five model. Results were evaluated with two complementary measures (RMSE and Pearson product-moment correlation), which made it possible to identify whether systems with low error rates might owe their results to random chance. Regardless of the approach, openness to experience is the trait on which participants obtained the best results for both measures.
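The complementary role of the two measures can be illustrated with a minimal sketch. It is not the track's official evaluation script: the trait scores below are made-up numbers, and the function names `rmse` and `pearson` are chosen here for illustration only. A constant guess at the mean reaches a lower RMSE than a prediction that tracks the true ordering perfectly but with an offset, while its correlation is undefined, which is the kind of case the second measure is meant to expose.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length score lists."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(y_true, y_pred):
    """Pearson product-moment correlation; NaN if either list is constant."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_t * var_p)


# Hypothetical trait scores for five authors (illustrative values only).
true_scores = [40, 50, 60, 45, 55]
mean_baseline = [50, 50, 50, 50, 50]   # always predicts the mean
shifted_model = [50, 60, 70, 55, 65]   # right ordering, constant offset of +10

for name, pred in [("mean baseline", mean_baseline), ("shifted model", shifted_model)]:
    print(name, "RMSE:", round(rmse(true_scores, pred), 2),
          "Pearson:", pearson(true_scores, pred))
```

Read together, the two numbers separate a system that merely stays close to the average from one that actually follows the differences between authors.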

Similar resources

PRHLT at PR-SOCO: A Regression Model for Predicting Personality Traits from Source Code

This paper describes our participation in the PAN@FIRE Personality Recognition in Source Code (PR-SOCO) 2016 shared task. We proposed two different approaches to tackle this task: on the one hand, each code sample from each author was taken as an independent sample and vectorized using word n-grams; on the other hand, all the code from an author was taken as a single sample, and it ...

Personality Recognition in Source Code Working Note: Team BESUMich

In this paper, we describe the results of source code personality identification from Team BESUMich. We used a set of simple, robust, scalable, and language-independent features on the PR-SOCO dataset. Using a leave-one-coder-out strategy, we obtained the minimum RMSE on the test data for extroversion, and competitive results for the other personality traits.

Personality Recognition Applying Machine Learning Techniques on Source Code Metrics

Source code has become a data source of interest in recent years. In the software industry, extracting source code metrics is common, mainly for quality assurance purposes. In this paper, source code metrics are used to consolidate programmers' profiles in order to identify different personality traits using machine learning algorithms. This work was done as part of the Personal...

PAN@FIRE: Overview of CL-SOCO Track on the Detection of Cross-Language SOurce COde Re-use

The detection of source code re-use is an important research field for both the software industry and academia. This paper summarizes the goals, organization and results of the second SOCO competitive evaluation campaign for systems that automatically detect source code re-use. The PAN@FIRE shared task, named Cross-Language SOurce COde Re-use (CL-SOCO), focused on the detection of...

Shallow Recurrent Neural Network for Personality Recognition in Source Code

Personality recognition in source code constitutes a novel task in the field of author profiling on written text. In this paper we describe our proposal for the PR-SOCO shared task in FIRE 2016, which is based on a shallow recurrent LSTM neural network that tries to predict five personality traits of the author given a source code fragment. Our preliminary results show that it should be possibl...

Publication date: 2016